Identifying the MAGs with the most expression: roughly the top 15 MAGs look to be of interest, and these are cyanobacteria and proteobacteria. With a cutoff of roughly 1000 TPM for the whole-year expression (which is very little), the most significant contributions essentially come from 6 MAGs. Bin 236, the Aphanizomenon MAG, is the most prominent. Two proteobacterial MAGs (Pelagibacteraceae) are also among the more prominent MAGs.
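The ranking step could be sketched as follows. This assumes expression is available per MAG as a list of TPM values across the sampling dates. The MAG names and numbers are hypothetical placeholders, and only the 1000 TPM cutoff comes from the note above.

```python
# Hypothetical per-sample TPM values per MAG (placeholder data, not real results).
tpm_by_mag = {
    "mag_A": [900.0, 1200.0, 1500.0, 800.0],
    "mag_B": [10.0, 600.0, 900.0, 5.0],
    "mag_C": [50.0, 40.0, 60.0, 30.0],
}

CUTOFF_TPM = 1000.0  # cutoff on the whole-year summed expression


def rank_mags(tpm_by_mag, cutoff=CUTOFF_TPM):
    """Return (mag, total TPM) pairs at or above the cutoff, highest total first."""
    totals = {mag: sum(values) for mag, values in tpm_by_mag.items()}
    kept = [(mag, total) for mag, total in totals.items() if total >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


for mag, total in rank_mags(tpm_by_mag):
    print(f"{mag}\t{total:.0f}")
```

Summing over the year favours MAGs that are moderately active all year over MAGs with a short, intense bloom. Checking the per-date maximum alongside the sum is one way to catch the latter.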
What is interesting here is the extra peak in phosphonate metabolism that the autotrophic community shows compared with the heterotrophic community. It would be very interesting to see this in the whole-community data, since only a few MAGs are responsible for this expression (hardly representative for a big-picture overview). The autotrophic and heterotrophic expression appears to occur in tandem.
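The autotrophic vs. heterotrophic comparison amounts to summing the phosphonate-gene expression of all MAGs in each trophic group on each sampling date. Below is a minimal sketch of that aggregation. The MAG-to-group assignment and the TPM values are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical assignment of MAGs to trophic groups (placeholders).
trophic_group = {"mag_A": "autotroph", "mag_B": "autotroph", "mag_C": "heterotroph"}

# Hypothetical phosphonate-gene TPM per MAG, one value per sampling date
# (same date order for every MAG).
phosphonate_tpm = {
    "mag_A": [5.0, 20.0, 60.0, 10.0],
    "mag_B": [0.0, 5.0, 40.0, 0.0],
    "mag_C": [8.0, 25.0, 30.0, 12.0],
}


def sum_by_group(expression, groups):
    """Sum the per-date expression of all MAGs in each group into one series per group."""
    n_dates = len(next(iter(expression.values())))
    totals = defaultdict(lambda: [0.0] * n_dates)
    for mag, values in expression.items():
        series = totals[groups[mag]]
        for i, value in enumerate(values):
            series[i] += value
    return dict(totals)


print(sum_by_group(phosphonate_tpm, trophic_group))
```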
Filtering step. Plotting the cyanobacterial MAGs shows that there are essentially only two cyanobacterial MAGs of interest: bin236 and bin109, which are Aphanizomenon and N. spumigena, respectively. Among the proteobacteria, these plots show only two interesting MAGs, both belonging to the Pelagibacteraceae.
Given these dynamics, it is also important to see how the MAGs themselves behave by looking at their total expression activity. This shows that the Aphanizomenon bin is active more or less all year round, while N. spumigena is active mainly in summer. This explains the pulses that appear in the P-met expression: N. spumigena appears to be doing everything at the same time. For Aphanizomenon, phosphonate metabolism appears to occur mostly outside of winter, which is also what we would expect. #Fig 7 and 8
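To quantify "active all year" vs. "active mainly in summer", total expression can be summed per MAG and season. This minimal sketch assumes meteorological Northern Hemisphere seasons (December to February as winter, and so on). That season definition is my assumption, and the records are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Assumption: meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_totals(samples):
    """Sum TPM per (MAG, season) from (sample date, MAG, TPM) records."""
    totals = defaultdict(float)
    for sample_date, mag, tpm in samples:
        totals[(mag, SEASON_BY_MONTH[sample_date.month])] += tpm
    return dict(totals)


# Hypothetical records (placeholders, not real data).
samples = [
    (date(2013, 1, 15), "mag_A", 300.0),
    (date(2013, 7, 10), "mag_A", 350.0),
    (date(2013, 1, 15), "mag_B", 5.0),
    (date(2013, 7, 10), "mag_B", 900.0),
]
print(seasonal_totals(samples))
```

If the sampling frequency differs between seasons, per-season means would be fairer than sums.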
Another interesting question is whether the expression from the Aphanizomenon MAG correlates with the abundance of Nostocophyceae (activity vs. expression correlation). The comparison does not show a clear relationship (TPM divided by 2000 to match the scale of the biomass). N. spumigena will also be added here.
For the proteobacterial MAGs belonging to the family Pelagibacteraceae, the phosphonate genes seem to peak in late spring/early summer. The first MAG appears to show a succession: phosphonate expression in spring/summer, followed by phosphatase expression in summer/autumn, and finally Pi uptake in winter. This indicates that this MAG is able to metabolise both organic and inorganic P.
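The succession can be made explicit by finding the month in which each gene category peaks and ordering the categories by that month. This is a minimal sketch with hypothetical monthly means; the category names follow the text above.

```python
# Hypothetical monthly mean TPM per gene category for one MAG (placeholder values).
monthly_tpm = {
    "phosphonate": {4: 10.0, 5: 40.0, 6: 55.0, 8: 20.0, 10: 5.0, 12: 2.0},
    "phosphatase": {4: 2.0, 5: 5.0, 6: 15.0, 8: 45.0, 10: 30.0, 12: 4.0},
    "Pi uptake": {4: 5.0, 5: 3.0, 6: 2.0, 8: 4.0, 10: 12.0, 12: 40.0},
}


def peak_month(series):
    """Month with the highest expression in a {month: TPM} mapping."""
    return max(series, key=series.get)


def succession_order(monthly):
    """Categories sorted by the calendar month in which their expression peaks."""
    return sorted(monthly, key=lambda category: peak_month(monthly[category]))


print([(c, peak_month(monthly_tpm[c])) for c in succession_order(monthly_tpm)])
```

Sorting by calendar month ignores the wrap-around at the year boundary. A winter peak in January would sort first, so for a succession running into winter it may be better to start the "year" in, say, March.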
#Fig 12, 13
This figure shows that P-gene expression is indeed seasonal, and that the pattern recurs across the different years (very nice). The PCA is based on all P-genes; it could be worthwhile to later split them by category and look at each category separately.
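Splitting the P-genes by category could start from a KO-to-category lookup that collapses each sample's per-KO TPM into per-category totals. The grouping below is an assumption based on the KEGG ortholog names (pst = Pi transport, phn = phosphonate transport, phoA = alkaline phosphatase) and should be checked against KEGG. The sample values are hypothetical.

```python
from collections import defaultdict

# Assumed grouping of KEGG orthologs into P-gene categories (verify against KEGG before use).
KO_CATEGORY = {
    "ko:K02036": "Pi uptake",    # pstB
    "ko:K02038": "Pi uptake",    # pstA
    "ko:K02040": "Pi uptake",    # pstS
    "ko:K02041": "phosphonate",  # phnC
    "ko:K02044": "phosphonate",  # phnD
    "ko:K01077": "phosphatase",  # phoA
}


def sum_by_category(sample_tpm, mapping=KO_CATEGORY):
    """Collapse one sample's {KO: TPM} into {category: TPM}; unmapped KOs go to 'other'."""
    totals = defaultdict(float)
    for ko, tpm in sample_tpm.items():
        totals[mapping.get(ko, "other")] += tpm
    return dict(totals)


# Hypothetical expression for one sample (placeholder values).
sample = {"ko:K02040": 12.0, "ko:K02036": 3.0, "ko:K02044": 7.5, "ko:K06217": 1.0}
print(sum_by_category(sample))
```

Applied to every sample, this gives a samples x categories matrix. Each category's KOs could then be ordinated separately, or the collapsed matrix could be used in place of the per-KO one.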
Correlation of genes with other parameters. Unfortunately, the correlations of Pi with the genes are very low (best is 0.). On the other hand, some genes seem to correlate with nitrogen. This may be because a MAG can be very productive when N is abundant, which in turn increases the expression of these genes as well. N might be the limiting factor here, a sign of N limitation(?). Here the correlation is done for Aphanizomenon. In the NMDS, the Julian category refers to the Julian day of the year, e.g. 1-365.
## Call: rda(formula = aphani.wide.hellinger ~ Chla_Average +
## Nitrate_Average + Phosphate_Average, data = lmo_date_prep, na.action =
## na.exclude)
##
## Inertia Proportion Rank
## Total 0.04900 1.00000
## Constrained 0.02064 0.42131 3
## Unconstrained 0.02836 0.57869 8
## Inertia is variance
## 2 observations deleted due to missingness
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3
## 0.017599 0.002145 0.000900
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 0.010462 0.008564 0.004124 0.002322 0.001457 0.001155 0.000198 0.000074
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = aphani.wide.hellinger ~ Chla_Average + Nitrate_Average + Phosphate_Average, data = lmo_date_prep, na.action = na.exclude)
## Df Variance F Pr(>F)
## RDA1 1 0.0175991 16.7570 0.001 ***
## RDA2 1 0.0021453 2.0427 0.257
## RDA3 1 0.0009003 0.8572 0.499
## Residual 27 0.0283568
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
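The RDA response matrices (`aphani.wide.hellinger`, `pelagi1.wide.hellinger`) are, judging by their names, Hellinger-transformed. In vegan that is presumably `decostand(x, method = "hellinger")`, which takes the square root of each value divided by its row total. The "Proportion" column in the summary is simply each inertia divided by the total inertia. A minimal stdlib sketch of both follows; the matrix is hypothetical.

```python
import math


def hellinger(rows):
    """Hellinger transform: square root of each value divided by its row (sample) total."""
    transformed = []
    for row in rows:
        total = sum(row)
        # Rows with a zero total are left as zeros to avoid division by zero.
        transformed.append([math.sqrt(v / total) if total > 0 else 0.0 for v in row])
    return transformed


def constrained_proportion(constrained_inertia, total_inertia):
    """Share of the total inertia (variance) explained by the constraining variables."""
    return constrained_inertia / total_inertia


# Hypothetical samples x genes TPM matrix (placeholder values).
print(hellinger([[9.0, 16.0, 0.0], [1.0, 1.0, 2.0]]))
# Constrained and total inertia as printed in the Aphanizomenon RDA summary above.
print(round(constrained_proportion(0.02064, 0.04900), 3))
```

The last line gives about 0.421. The summary's 0.42131 differs slightly only because it is computed from the unrounded inertias.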
| Variable | rho (vs. Phosphate_Average) | p-value |
|---|---|---|
| Temperature_C | -0.829 | 4.73e-09 |
| cDOM_Average | -0.146 | 0.449 |
| Chla_Average | -0.348 | 0.0550 |
| DOC_Average | -0.646 | 0.000492 |
| Nitrate_Average | 0.569 | 0.000674 |
| Phosphate_Average | 1 | 0 |
| ko:K01077 | -0.255 | 0.159 |
| ko:K02036 | -0.050 | 0.784 |
| ko:K02039 | -0.069 | 0.707 |
| ko:K02040 | 0.388 | 0.0282 |
| ko:K02041 | -0.258 | 0.154 |
| ko:K02044 | -0.404 | 0.0217 |
| ko:K06217 | 0.476 | 0.00584 |
| ko:K07636 | -0.143 | 0.434 |
Same analysis as above, but for the proteobacteria (the first Pelagibacteraceae MAG, pelagi1). Nothing correlated well here either. It may not make much of a difference, but the genes could be grouped here as well to look for potentially stronger (or weaker) correlations.
## Call: rda(formula = pelagi1.wide.hellinger ~ Chla_Average +
## Nitrate_Average + Phosphate_Average, data = env_variables, na.action =
## na.exclude)
##
## Inertia Proportion Rank
## Total 0.23713 1.00000
## Constrained 0.03427 0.14453 3
## Unconstrained 0.20286 0.85547 8
## Inertia is variance
## 2 observations deleted due to missingness
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3
## 0.023329 0.010617 0.000326
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 0.11617 0.05189 0.02069 0.00634 0.00297 0.00216 0.00156 0.00109
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = pelagi1.wide.hellinger ~ Chla_Average + Nitrate_Average + Phosphate_Average, data = env_variables, na.action = na.exclude)
## Df Variance F Pr(>F)
## RDA1 1 0.023329 2.8750 0.294
## RDA2 1 0.010617 1.3084 0.582
## RDA3 1 0.000326 0.0402 1.000
## Residual 25 0.202859
| Variable | rho (vs. Phosphate_Average) | p-value |
|---|---|---|
| Temperature_C | -0.818 | 3.29e-08 |
| cDOM_Average | -0.225 | 0.258 |
| Chla_Average | -0.375 | 0.0449 |
| DOC_Average | -0.595 | 0.00275 |
| Nitrate_Average | 0.538 | 0.00214 |
| Phosphate_Average | 1 | 1.73e-216 |
| ko:K02036 | -0.118 | 0.535 |
| ko:K02038 | -0.301 | 0.106 |
| ko:K02039 | -0.290 | 0.121 |
| ko:K02040 | 0.233 | 0.216 |
| ko:K02041 | -0.354 | 0.0547 |
| ko:K02044 | 0.087 | 0.649 |
| ko:K06217 | -0.392 | 0.0324 |
| ko:K19670 | -0.429 | 0.0181 |
#Other ideas: correlate the P-related genes in the MAGs with the other genes in the same MAGs to see whether any correlate tightly, which would suggest a potential relationship. Another variant would be a network analysis of the CDS.